Suboptimality  of  Nonlocal  Means 
for  Images  with  Sharp  Edges 


Arian  Maleki1’*,  Manjari  Narayan2,  Richard  G.  Baraniuk3’** 


Dept,  of  Computer  and  Electrical  Engineering,  Rice  University,  MS-380 
6100  Main  Street,  Houston,  TX  77005,  USA 


Abstract 

We  conduct  an  asymptotic  risk  analysis  of  the  nonlocal  means  image  denot¬ 
ing  algorithm  for  the  Horizon  class  of  images  that  are  piecewise  constant  with 
a  sharp  edge  discontinuity.  We  prove  that  the  mean  square  risk  of  an  opti¬ 
mally  tuned  nonlocal  means  algorithm  decays  according  to  n~l  log1,/2+e  n,  for 
an  n- pixel  image  with  t  >  0.  This  decay  rate  is  an  improvement  over  some 
of  the  predecessors  of  this  algorithm,  including  the  linear  convolution  filter, 
median  filter,  and  the  SUSAN  filter,  each  of  which  provides  a  rate  of  only 
n-2/3.  It  is  also  within  a  logarithmic  factor  from  optimally  tuned  wavelet 
thresholding.  However,  it  is  still  substantially  lower  than  the  the  optimal 
minimax  rate  of  n-4^3. 
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1.  Introduction 


1.1.  Image  denoising 

The  long  history  of  image  denoising  is  testimony  to  its  central  impor¬ 
tance  in  image  processing.  A  wide  range  of  algorithms  have  been  developed, 
ranging  from  simple  linear  convolution  and  median  filtering  to  total  variation 
denoising  [1]  and  sparsity  exploiting  algorithms  such  as  wavelet  shrinkage  [2]. 
Due  to  the  sensitivity  of  human  visual  system  to  edges,  the  ability  to  preserve 
sharp  edges  is  an  important  criterion  for  noise  removal  algorithms.  Therefore 
Korostelev  and  Tsybakov  proposed  a  framework  to  characterize  the  perfor¬ 
mance  of  image  denoisers  on  edges  [3] .  Based  on  this  framework,  we  aim  to 
characterize  the  performance  of  several  denoising  algorithms  that  represent 
the  current  state  of  the  art  image  enhancement  techniques.  In  particular,  we 
will  focus  on  the  popular  and  powerful  nonlocal  means  (NLM)  algorithm. 


1.2.  The  minimax  framework 

In  this  paper,  we  are  interested  in  estimating  a  function  /  :  [0, 1]"  — >  R 
from  noisy  pixel  level  observations.  Define  Pixel(i,j)  —  [-,— )  x  [A  — ), 
and  let  xitj  —  A ve(/  |  Pixel (i.  j))  be  the  pixel  level  averages  of  /.  We  observe 
the  samples 

Vi.j  —  %i,j  T  Zij , 

where,  Zij  is  iid  N(0,  a2).  The  goal  is  to  recover  the  original  pixel  values  xi:J 
from  the  observations  y,j .  based  on  some  information  about  the  function  /. 
For  a  given  function  /  and  an  estimator  /  we  define  the  risk  Junction  as 


Rn(fJ)  =  E 


The  risk  can  also  be  written  as 


Rn(fJ)  = 


(1) 


(2) 

where  the  first  and  second  terms  correspond  to  the  bias  and  variance  of  the 
estimator  /,  respectively. 

Let  /  belong  to  a  class  of  functions  J~.  e.g.,  a  class  of  edge-like  images 
that  represent  edges  with  different  shapes  and  orientations.  The  risk  defined 
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in  (1)  depends  on  the  specific  choice  of  /.  We  define  the  risk  of  an  estimator 
f  on  the  class  JF  as  the  risk  of  the  worst-case  signal,  i.e., 

Rn(T,.f)  =  sup  Rr{fJ). 

feT 

The  minimax  risk  over  functions  in  T  is  then  defined  as  the  risk  of  the  best 
possible  estimator,  i.e., 

K(F)  =  inf  sup  Rn (/,/). 

/  feF 

The  minimax  risk  is  a  lower  bound  for  the  performance  of  all  measurable 
estimators  for  signals  in  J-. 

In  this  paper  we  are  interested  in  the  asymptotic  setting  where  the  number 
of  pixels  n  — >  oc.  For  all  of  the  estimators  we  consider,  Rn  (  J- .  f)  — >  0  as 
n  — >  oc.  Therefore,  we  consider  the  decay  rate  of  the  risk  as  the  performance 
measure.  We  will  derive  the  minimax  risk  for  several  popular  image  denoising 
techniques  below. 

We  will  use  the  following  asymptotic  notation  in  this  paper. 

Definition  1.  f(n)  —  0(g(n))  as  n  oc,  if  and  only  if  there  exist  n0  and 
c  such  that  for  any  n  >  n0,  |/(n)|  <  c\g(n)\.  Likewise,  f(n)  —  f l(g(n ))  as 
n  — >  oo,  if  and  only  if  there  exist  n0  and  c  such  that  for  any  n  >  n0,  |/(n)  |  > 
c\g(n)\.  Finally,  f(n )  =  0(<y(n)),  if  f(n)  =  0(g(n))  and  f(n)  =  Ll{g(n)). 
We  may  interchangeably  use  f(n)  x  g(n)  for  f(n)  —  G(g(n)). 

Definition  2.  f(n)  —  o(g(n))  if  and  only  if  lim^oo  =  0. 

1.3.  Horizon  edge  model 

Several  different  image  edge  models  have  been  developed  in  the  image 
processing  and  denoising  literature.  Here  we  will  use  the  Horizon  model 
that  contains  piecewise  constant  images  with  edges  that  are  smooth  in  the 
direction  of  the  edge  contour  but  discontinuous  in  the  direction  orthogonal 
to  the  edge  contour  [3,  4].  Specifically,  let  Holder0 (C)  be  the  class  of  Holder 
smooth  functions  on  R,  defined  as  follows:  h  <G  H older* (C)  if  and  only  if 

|/iw(t1)-/i(fe)(t,1)|  <  C\h  -  t\\a~k , 

where  k  =  [oj .  Given  a  one-dimensional  smooth  edge  contour  function  h, 
we  define  the  image  fh  :  [0,  l]2  ->  R  as  fh{tl,t2)  =  1  {t2<h(ti)b  where  is 
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t2 


Figure  1:  An  example  of  a  Horizon  function,  a  piecewise  constant  image  containing  an 
edge  that  is  Holder °  smooth  in  the  direction  of  the  edge  contour  but  discontinuous  in  the 
direction  orthogonal  to  the  edge  contour. 


the  indicator  function.  Based  on  this  construction,  we  define  the  Horizon 
class  of  functions  as 

Ha{C )  =  {fh(tu  t2)  :  h  e  H  older*  (C)  D  Holder1  (1)},  (3) 

where  a  is  the  smoothness  of  the  edge  contour.  Figure  1  plots  a  representative 
function  from  this  class. 

The  following  theorem,  proved  in  [3],  specifies  the  minimax  risk  of  the 
class  of  all  measurable  estimators  on  H*(C). 

Theorem  1.  [3]  For  a  >  1,  the  minimax  risk  of  the  class  HQ(C)  is 

R:(Ha(C))xn^.  (4) 

We  are  particularly  interested  in  the  case  a  =  2  edges,  for  which  the 
optimal  rate  is  n~F:f  The  rate  provided  in  the  above  theorem  is  the  Holy 
Grail  of  image  denoising  algorithms. 

1-4-  A  menagerie  of  denoising  algorithms 

We  will  perform  a  minimax  risk  analysis  of  not  just  nonlocal  means  but 
a  number  of  other  popular  image  denoising  algorithms. 
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1 . 4  ■  1  ■  Linear  filtering 

The  classical  denoising  method  is  the  linear  convolution  filter,  which  es¬ 
timates  the  image  via 

■fg  ifij)  ~  9m,eyi-m,j-e,  (5) 

m  e 

where  g  is  a  two  dimensional  filter  impulse  response  that  satisfies  g,j  — 

1.4  When  all  the  weights  gVJ  are  equal,  the  algorithm  is  called  the  running 
average  or  the  box  filter.  Most  of  the  linear  filters  used  in  practice  are  sym¬ 
metrical  and  approximately  isotropic. 

Definition  3.  Let  g  be  a  real  and  symmetric  filter  response,  i.e.,  ghJ  =  g-ij  — 
gi-j,  and  let  G(u)\,  uif)  represent  its  two-dimensional  Fourier  transform.  The 
filter  is  isotropic  if  and  only  if  there  exists  a  function  F  :  IR  — >  C,  such  that 

G{oJi,UJ2)  -  F  V  — 7T  <  a>i,W2  <  7T. 

Isotropic  filters  are  popular,  because  they  treat  image  features  similarly 
regardless  of  their  directions.  Let  grad(-)  be  the  gradient  operator.  The 
following  theorem,  proved  in  Section  4.1,  provides  the  decay  rate  of  the  risk 
of  linear  convolution. 

Theorem  2.  Consider  the  linear  convolution  filter  (5)  and  suppose 
that  g  is  real,  symmetric,  and  isotropic.  Furthermore,  assume  that 
1 1  grad  ( C?  ( ro’i ,  ) )  1 1 2  £  C  for  a  fixed  constant  C.  Then, 

inf  sup  Rnifjg1')  x  n“i. 

9  feH°(C) 

Castro  and  Donoho  [5]  have  proved  a  similar  result  for  the  special  case  of 
the  box  filter.  While  the  Horizon  model  used  in  [5]  is  slightly  different  from 
our  model,  their  proof  works  in  our  setting  as  well. 

1.4.2.  Yaroslavsky  /  SUSAN  filter 

While  linear  filters  are  popular  in  image  processing  due  to  their  simplicity, 
they  unfortunately  blur  images  with  sharp  edges.  One  popular  alternative  is 

4For  simplicity  of  analysis,  we  use  a  periodic  extension  of  y  at  the  image  boundaries. 
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to  adapt  the  weight  of  each  pixel  in  the  average  (5)  according  to  the  distance 
between  its  noisy  value  and  the  value  of  the  pixel  we  aim  to  estimate.  Let 
Ctj‘  ~  {(m>  £)  I  *  -  An  <  m  <  i  +  A„,  j  -  An  <  i  <  j  +  A,,}  denote  the 
A„-neighborhood  of  the  pixel  (i,  j).  One  popular  approach  for  setting  the 
weights  is 

wYjim,  £)  =  e 

from  which  we  calculate  the  estimate 


(Vtn,£-Vi,j> 

2 r2 


/I„,r(bj)  = 


EL+=t-A„E2f-A 


(6) 


Only  the  pixels  in  the  A, , -neighborhood  of  (i,j)  contribute  to  the  estimate 
of  that  pixel.  This  algorithm  is  called  the  Yaroslavsky  Filter  (YF)  or  SUSAN 
filter  [6.  7];  slight  modifications  are  known  as  the  bilateral  filter  [8]  and  a -filter 
[9], 

To  calculate  the  risk  of  the  YF,  we  consider  a  slightly  different,  oracle- 
based  algorithm.  Suppose  that  in  setting  the  weights  wj-(rn ,  £)  of  the  YF  we 
have  access  to  the  actual  (and  not  the  noisy)  value  of  the  pixel  (i,j).  Using 
this  oracle  information  we  can  set  the  weights  according  to 


w. 


sJ(mJ)  =  e 


(ym.t 


Intuitively,  the  oracle  weights  are  "less  noisy”  than  the  actual  filter  weights. 
Plugging  these  weights  into  (6)  we  obtain  what  we  call  the  semi-oracle 
Yaroslavsky  filter  (SYF).  The  following  theorem,  proved  in  Section  4.2,  shows 
that,  as  far  as  the  decay  rate  is  concerned,  the  SYF's  performance  is  the  same 
as  the  linear  filter  and  box  filter. 


Theorem  3.  The  risk  of  SYF  algorithm  satisfies 

inf  sup  RMJSr)  =  n(n-V3). 

T.An  feHa(C) 

1-4-3.  Sparsity  based  denoising 

Another  popular  class  of  image  denoising  methods  exploit  sparsity  in 
some  transform  domain  via  thresholding.  Wavelets  are  often  used  as  the 
sparsity  domain  for  natural  images.  Let  W(y)  represent  the  separable  two- 
dimensional  wavelet  transform  of  the  image,  let  IW  represent  the  inverse 
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wavelet  transform,  and  let  T  be  the  hard  thresholding  function,  i.e.,  Tg(x)  — 
.X'l{|x|>0j.  Then  wavelet  thresholding  denoising  corresponds  to 

ff  =lW(Tomy))). 

Donoho  and  Johnstone  have  proven  that  sup/e//,V(C)  Rn(f,  /"  )  -  Q(n-1)  [4], 
[10],  Even  though  this  rate  is  an  improvement  over  the  above  algorithms,  is 
still  far  from  the  optimal  achievable  rate  of  n~3  for  a  =  2. 

This  suboptimality  spurred  the  development  of  other  sparsity- inducing 
transformations,  including  curvelets  [11],  wedgelets  [4],  shearlets  [12],  and 
contourlets  [13],  Among  these  transforms,  wedgelet  denoising  provably  achieves 
the  optimal  rate  of  n~3  for  a  =  2  [4],  However,  wedgelet  denoising  performs 
poorly  on  textures,  which  has  limited  its  application  in  practice  to  date. 


2.  Nonlocal  means  denoising 

The  YF  estimator  sets  its  weights  according  the  noisy  pixel  values  and 
their  spatial  vicinity;  however  neither  of  these  two  features  are  reliable  for 
noisy,  edgy  images.  In  contrast,  the  nonlocal  m,eans  (NLM)  algorithm  sets 
its  weights  according  to  the  proximity  of  the  image  patch  surrounding  each 
noisy  pixel  with  other  patches  in  the  image  [14].  Define  the  ^-neighborhood 
distance  d$n  (yij,  ym,i)  between  two  observations  as 
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(VijiVnip)  ~  2  ^  ^  ^  | yi+e,j+m  ?Ai+£,p+m|  I !Ji,j  Vn,p\ 

rn 


m=—Sn  £=—S„ 


where  p\  =  (26 n  + 1)2  —  1.  Note  that,  in  contrast  to  the  definition  in  [14],  we 
have  removed  the  center  element  | yitj  —  yn,p |2  from  the  summation.  Since  we 
assume  that  Sn  —>  oc  as  n  —>  og,  the  effect  is  negligible  on  the  asymptotic 
performance.  But,  as  we  will  see  in  Section  4,  removing  the  center  element 
simplifies  the  calculations  considerably.  NLM  uses  the  neighborhood  dis¬ 
tances  to  estimate 

hj  ’  u 

where  S  —  {1,2,...,  n.}  x  {1, 2, ....  n}  and  Wij(m,  t)  is  set  according  to 
the  (^-neighborhood  distance  between  y^j  and  ym/.  For  the  simplicity  of 
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notation,  in  cases  where  both  the  reference  pixel  (i,  j)  and  the  algorithm 
are  obvious  from  the  context,  we  will  omit  the  superscript  and  subscript  of 
the  weight  and  use  the  simplified  notation  wm/  instead  of  wh-fm.f.).  It  is 
straightforward  to  verify  that  E(ri2  {y,j,  ym,e))  —  d2  .xmy)  +  2a2,  which 
suggests  the  following  strategy  for  setting  the  weights: 


,n  p\ f  1  if  dt  (Vij,  Vm, e)  <  2a2  +  tn, 

w  ;  \  0  otherwise 


where  tn  is  the  threshold  parameter.  Soft/tapered  versions  of  setting  the 
weights  have  been  explored  and  are  often  used  in  practice  [14].  However, 
the  above  untapered  weights  capture  the  essesnse  of  the  algorithm  while 
simplifying  the  analysis.  We  postpone  the  discussion  of  tapered  weights 
until  Section  5. 

There  are  two  main  differences  between  the  NLM  and  YF  algorithms. 
First,  the  pixels  that  contribute  in  the  NLM  averaging  are  not  necessarily  in 
the  local  neighborhood  of  the  reference  pixel  (hence  the  monicker  “nonlocal”). 
Second,  the  NLM  weights  depend  not  on  the  difference  between  the  pixel 
values  but  on  distance  between  the  pixel  neighborhoods.  In  other  words  the 
pixel  neighborhood  is  even  more  important  than  the  pixel  value. 

To  derive  a  lower  bound  for  the  risk  of  NLM,  we  will  analyze  two  algo¬ 
rithms  that  set  the  weights  using  some  degree  of  oracle  information  regard¬ 
ing  the  true  value  of  the  signal.  The  full  oracle  NLM  (FNLM)  has  access  to 
E(d|  (yitj,  y„i,e))  in  setting  the  weights  wmj  in  (7)  and  thus  sets  them  using 
the  noise-free  values  of  the  pixels 


1  if  dfin(xi!j,xrritg)  <  tn  , 
0  otherwise. 


(9) 


The  semi-oracle  NLM  (SNLM)  differs  only  slightly  from  the  standard  NLM 
in  that  it  uses  the  semi-oracle  neighborhood  distance 
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d-5n  (ViJ  >  Vn,p)  —  (  'y  ]  'y  \xi+e,j+m  —  yn+e,p+m\ 

Pn  \m=—5n  l=—8n 


(10) 


and  then  sets  the  weights  in  (7)  according  to 


wfj(m,  £) 


1  if  <%n(yi,j,ym,e)  <  v2 +  tn<, 
0  otherwise. 


(11) 


Unlike  FNLM,  SNLM  assumes  that  just  one-half  of  the  noise  is  removed  from 
the  distance  estimates.  Therefore,  the  distances  calculated  in  the  SNLM  are 
more  accurate  than  the  standard  NLM  but  less  accurate  than  in  the  FNLM. 
In  the  rest  of  the  paper,  we  will  use  fN,  fs,  and  fF  to  denote  the  NLM, 
SNLM,  and  FNLM  estimators,  respectively. 


3.  Main  Results 


Our  first  result,  proved  in  Section  4.3,  establishes  an  upper  bound  on  the 
risk  of  NLM. 

Theorem  4.  Fix  e  >  0  and  consider  NLM  denoising  with  Sn  =  2  log5+e  n 
and  tn  =  2T  .  The  risk  of  this  algorithm  over  the  class  Ha(C)  is 

log  2  n 


sup  R(fJN)  =  0 

f€Ha(C) 


(12) 


Before  we  discuss  the  implications  of  this  theorem,  it  is  important  to 
note  that,  while  we  can  improve  the  decay  rate  as  close  as  we  desire  to 
0(n-1log2  n),  the  constants  that  are  involved  in  the  big-0  notation  grow  as 
e  decreases.  Therefore,  in  practice  very  small  values  of  e  are  not  desirable. 

Comparing  the  upper  bound  (12)  with  the  optimal  minimax  risk  (4)  in¬ 
dicates  that  NLM  is  suboptimal  for  a  >  1.  In  other  words,  NLM  cannnot 
exploit  the  smoothness  of  edge  contours  in  images. 

The  bound  in  Theorem  4  is  for  a  specific  choice  of  parameters,  and  it  is 
natural  to  ask  whether  NLM  can  achieve  the  optimal  rate  with  some  other 
choice  of  parameters.  To  answer  this  question,  we  consider  SNLM,  which 
outperforms  standard  NLM  in  general.  We  make  the  following  mild  assump¬ 
tions: 


Al:  The  window  size  Sn  — >  oo  as  n  — >  oo.  This  assumption  is  critical  to 
ensuring  good  performance  of  any  NLM  estimator. 

A2:  The  threshold  is  set  to  a2  +  tn  as  explained  in  (11)  with  tn  >  0.  This 
ensures  that  if  the  neighborhood  of  pixel  (m,  l)  is  exactly  the  same  as 
the  neighborhood  of  pixel  then  wm^  —  1  with  high  probability. 

A3:  The  threshold  tn  is  set  such  that,  if  the  noise-free  neighborhoods  are 
different  in  more  than  half  of  their  pixels,  i.e.,  if  d2(xitj,  xmj)  >  then 
P  {w[j{mJ)  =  1)  =  o(n_1). 


9 


A4:  Sn  =  0(nP),  for  some  8  <  0.3. 

The  following  theorem  provides  a  lower  bound  on  the  performance  of 
SNLM. 

Theorem  5.  Suppose  that  Sn  and  tn  satisfy  Al  Af.  The  risk  of  the  SNLM 
over  the  class  Ha(C)  is 

inf  sup  R(fJs)=n(n-1). 

°n’tn  feH°(c) 

This  bound  is  still  suboptimal  compared  to  the  minimax  rate  for 

a  —  2.  In  the  words  of  John  Cornyn  III,  the  junior  United  States  Senator 
for  Texas,  “The  problem  with  a  mini-deal  is  we  have  a  maxi-problem”  [15]. 

Remarkably,  this  lower  bound  is  achieved  on  a  very  simple  image  on 
which  NLM  would  be  assumed  to  work  very  well:  lp2<o.5}  (see  Figure  2). 
Here  is  what  goes  wrong.  Consider  the  estimation  of  an  “edge”  pixel  (i,j) 
that  satisfies  j  —  [n/?,(^)j.  Define  the  set  J  —  {(rn,  T)  \  t  —  |_n/i(^)J}  as 
the  set  of  pixels  just  below  the  edge.  We  will  prove  the  probability  that  a 
pixel  in  J  contributes  to  the  NLM  estimate  (whj(m.  L)  —  1)  is  larger  than 
po,  where  p0  does  not  depend  on  n.  This  happens  due  to  the  low  “signal  to 
noise  ratio”  in  the  distance  estimates.  Hence  0(n)  pixels  of  .7  will  contribute 
to  the  NLM  estimate.  Since  these  pixels  have  xm>(  —  1,  they  introduce  a 
large  bias  in  the  estimate.  In  fact,  we  show  below  that  the  bias,  as  defined  in 
(2),  will  be  larger  than  .  Here  npo  corresponds  to  the  pixels  below 

the  edge  that  pass  the  threshold.  This  shows  that  the  bias  is  clearly  0(1). 
Since  there  are  n  edge  pixels,  the  risk  of  the  estimator  over  the  entire  image 
is  fl(n_1). 

4.  Proofs  of  the  Main  Theorems 

4-1.  Proof  of  Theorem  2 

The  proof  has  two  main  steps.  The  first  step  is  to  prove  that  there  exists 
a  linear  filter  for  which  the  supremum  risk  is  upper  bounded  by  0(n“2/:i). 
For  this  step  we  use  Theorem  3.1  and  3.2  from  [5],  which  establish  the  same 
upper  bound  for  the  box  filter.  The  second  and  more  challenging  step  is  to 
prove  that  no  other  linear  filter  can  improve  on  this  decay  rate.  The  rest  of 
this  section  is  dedicated  to  the  proof  of  this  fact. 
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Figure  2:  The  simple  image  Horizon  lp2<o,5}  used  for  proving  the  various  lower  bounds. 


Consider  the  function  fh(ti,  t2)  for  h(t)  —  |  and  suppose  that  n  is  even. 


This  function  is  displayed  in  Figure  2.  Let 


1 


X(ku  k2)  =  7 x<iAe" 
h  <2 


:’-trklel  j2*k2e2 
'  n  0  J  n 


n 


represent  the  Discrete  Fourier  Transform  (DFT)  of  a  discrete  two-dimensional 
discrete  signal  x.  Since  y  =  x  +  z,  the  DFT  of  fFF  equals 

FgF  (k\,k2)  —  Y(ki,k2)  ■  G(k\,  k2)  —  X(ki,k2)G(ki,k2)  +  Z(ki,k2)G(ki.k2), 

where  Z  is  again  iid  N( 0,  a2).  For  fh(t i,t2)  with  h(t\)  =  X(k\,  k2)  satisfies 

f  0  if  h  ±  0, 

*(*!’ k 2)  =\  if  ki  =  0.  (13) 

f  1— e  J  n 


It  is  easy  to  see  that  Rn(f,fgF)  =  ^E(||X  -  Fff\\2f),  where  ||y||J.  = 

'f2kl  k,  \Y(ki,k2)\2.  If  we  define  13(f)  as  the  bias  of  the  estimator  /,  then 
we  have 


> 


1  £  1  —  G(0,  k2' ,0  1 


n 


tr 


l<fc-2<fi,  odd 


57  |1-G(0,*2) 


sin 

2 


2  7rfc  -2 
n 

1 


l<k2<r 2/3,  odd 


sin 


2  7 ~k2  ' 
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The  variance  of  the  estimator  is 


Var ifgF)  =  \G{h,k2)\2a2. 

/  L 

ki,k2 


We  know  that 

ki  ,k2 


\G(lo'i,uj2)\2  +  0{n  l), 


(14) 


where  G  is  the  continuous  Fourier  transform  of  g  and  satisfies  ||grad(C?)  || 2  < 
C.  Since  g  is  isotropic,  there  exists  F  :  IR  — >  C  such  that 


G(cai,ca2)  =  F 


Changing  the  variables  of  integration  in  (14)  to  polar  coordinate  radius  uir  — 
uij  +  eel  angle  9 ,  we  have 

p2ir  p2n 

\G(uji,lu'2)\2  >  2tt  /  r\F(r)\2dr  —  2n  /  w2|G(0,a;2)|2<ia;2.  (15) 

J  r= 0  J  ui2=0 

Combining  (14)  and  (15)  we  have 

Var(/,fF)  =  \  \G(h,k2)\2(72  =  fc2|C?(0,  A;2)|  V  -  0(n_1). 

/  /  /  / 

A*i  ,A*2  Af2 


Summing  the  lower  bounds  for  the  bias  and  variance  of  this  estimator, 
we  obtain  the  following  lower  bound  for  the  risk  of  linear  filtering: 


Rn{f,  fLF)  =  B2(/f'F)  +  Var(/j 


>  — 
n2 


ns 


E  |l-G(0,fa) 

l<&2<n2A3,  odd 

E  |l-G(0,fa) 

l<k2<n2/3,  odd 


1 


Sill 


n 


2  nk2 


+ 


47T2 


nz 


J2k2\G{0,k2)\2a2-O{n-1] 


k2 


TT2k,2 


47T2 


n * 


^  fc2|G(0, k2)\2cr2  -  0(n  J). 


k‘2 


Minimizing  the  dominant  term  of  the  lower  bound  over  the  filter  weights 
provides  G*(0,  k2)  =  — 4J,g2fc:i  for  odd  values  of  k2  and  zero  for  even  values 

i-i — _ 2 
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of  k2.  To  find  a  lower  bound  we  calculate  the  bias  term  with  these  optimal 
weights: 


> 


nz 


77/ 


E  M(o,  *,)!*" 


I<fe2<n2/3,  odd 


E 


Air4cr2kl/n2  \2  n2 


l<k2<ri2/a ,  odd 


1  +  47rV2fc|/n2 


■K2k\ 


1  y 

Ti2  2-  \  1 


/  A.-K4o2k\j 


n 


n 


rf 


.  _  +  47T4CT2 

l<A*2<n2/3,  odd 

(  w  V  y 

\  1  +  47T4(72  / 

7  1</C2<n2/3,  odd 

47T4(T2  \  2  /  r)-2/3 


7T2A:2 


kt 


1  + 


47T4<T2  / 


40 


+  o(n"2'3)  )  . 


This  completes  the  proof. 


4-2.  Proof  of  Theorem  3 

In  this  section,  denote  the  pixel  to  be  estimated  as  xhj.  For  clarity  we 
use  the  notation  wm  (  instead  of  wfj  (m,  £).  We  first  characterize  some  of  the 
properties  of  the  SYF  weights. 


Lemma  1.  Suppose  that  Xij  —  0.  If  xmj  —  xhj,  then  F.(wm/ymte)  —  0. 

-1 

Furthermore,  if  \xitj  -  xmj\  =  1,  then  E (wm/ymie)  >  vJ+TZe2(a'+T2)  • 

Proof.  The  first  claim  is  clear  from  symmetry.  To  prove  the  second  claim, 
we  observe  that  E (wm<eynhe)  =  E (wm>exm>t)  +  E («v*W)-  Since  xm<e  = 
1,  we  calculate  E(wm,e)  and  ’E(wm>ezm>e).  It  is  clear  that  E (wm,ezm,e)  = 


(2m,£_1)2  zm,C 


-U=  r  zm  eG  2r'  >  0.  Therefore  we  calculate 

V  2tt  j  —  oo  > 


E(wm,d  - 


J  - 


<W  » 

2T2 


zLe 


0  2  r2  2(a  2+t2)tz 


(t2tt 

i 

T7T7T 


OO  ^2^.2 

e 

—  OO 


-cr  H-H  /  2  ia-  _  .  a- 

2o2t2  (2m,£  cr2  +  T22m/+(CT2+T2)2. 


g  2(02+r2) 


u 


ct2t2 


TQ  2(<t'2  +  7-2) 


(72 


\/(T2  + 
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This  completes  the  proof. 


□ 


Define  the  A-neighborhood  of  a  pixel  (m.  1)  as  C^lt  —  { (i,  j)  :  \i  —  m\  < 

A,  I  j  -  t\  <  A}  n  s. 


Lemma  2.  Let  Cln  =  (2A„  +  l)2. 


We  then  have 


f .  , 

f 

\ 

p 

^  Id 

Y,  wZ  - 

-  E 

V 

/ 

<  2e~2n',t2 


The  proof  is  a  simple  application  of  the  Hoeffding  inequality. 

Proof  of  Theorem  3.  The  first  claim  is  that  the  optimal  neighborhood  size 
satisfies  A„  —  Q(log  n).  We  prove  this  by  contradiction.  Suppose  that  A„  — 
0(log(n))  and  consider  the  performance  of  the  SYF  on  the  image  Xij  —  0 
for  every  (i,j).  It  is  clear  that  the  bias  is  zero.  However,  the  variance  is 
lower  bounded  by  II  ■  This  is  far  from  the  optimal  performance  of  the 

linear  filters  analyzed  in  Theorem  2.  Therefore  A„  =  D(log(n)). 

Now  consider  the  example  image  shown  in  Figure  2  with  fh{t\-  tf)  — 
lp2<o.5}-  For  notational  simplicity  we  assume  n  is  even  so  that  the  value  of 
each  pixel  is  either  0  or  1.  Define  the  two  regions  Pi  —  { (z,  j)  :  |  <  j  < 
Y  an(I  P2  =  {(b  j)  :  j  >  tj  +  An}.  At  least  1/4  of  the  pixels  in  the 
neighborhood  of  the  pixels  in  Pi  have  the  noise- free  value  of  1 .  All  pixels  in 
the  neighborhood  of  the  pixels  in  P2  have  the  noise-free  pixel  values  equal  to 
1.  Over  each  region  we  will  find  a  lower  bound  for  the  risk  of  SYF  and  then 
sum  them  to  obtain  a  lower  bound  for  the  risk  over  the  entire  image. 

Case  I  ( i,j )  <G  Pp  From  the  Jensen  inequality  we  have 


E 


S(m,£)ecfj*  wm,eym,e 
52(m,e)ec£f  wi,j 


> 


Define  the  following  two  constants: 

3- itj  —  0,Xm(  —  0), 

'•&i ,j  ~  0,  Xm  (  =  1). 


m0  =  E  (wfj(m,£) 

mi  =  {m,  £) 
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It  is  clear  that  m0  >  m\.  Let  the  event  A  be 


*■=<  E 

for  some  e  >  0.  We  have 

'^2{m,e)ecAn  wm,eym,e 


wm,e 


Eu,'my  <  A 


E 


Yl(m,e)ec*r  Wm’e 


>  E 


X)( m,e)eC AT*  Wm,eym,t 


(®)  (  S( m,e)ec A?“  wm,eym,e  \ 

-  {  4A>0  +  AJ-« 

^  /  Z) (m,£)ec*n  wm,ey,n,l 

-  H  ^  4A>,0  +  A2- 


S(m,£)ecf"  U!"‘4 


AJ  P(A) 
-  P(AC) 


(16) 


A 


Aj  P(A) 


Inequality  (a)  uses  Lemma  2  and  the  fact  that  m0  >  mj.  Inequality  (ft)  uses 
Lemma  1,  and  therefore  Cq  —  - 1 — — — .  Since  CAn  has  (2A„  +  l)2 

v/<r2+r2e  21^+T2) 

pixels,  at  least  A2  of  them  have  the  noise- free  pixel  value  1. 

Since  A„  =  12(logn),  Lemma  2  proves  that  P(AC)  =  o(l)  and,  therefore,  the 
bias  is  lower  bounded  by  0(1)  for  all  of  the  pixels  in  P\ . 

Case  II  (i,j)  G  P2'  As  mentioned  before,  all  pixels  in  the  neighborhood 
of  the  pixels  in  P2  have  the  noise- free  pixel  values  equal  to  1 .  Hence,  we  have 
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Defining  the  event  A  as  in  (16),  we  have 


E 


>  E 


>  E 


EMeCf»  'Wm,eZmf 
J2(m,£)ecfp  w,n<( 

Wm,£zm,£ 

4A>0V  A*- 

E(m/)eCf"  wm,£zm,£ 

4A  k0  +  A2- 


T  P(A) 


A  P(A) 


-  PMC1  —  _  pc  ac) 

P(A)-(4m0A2+A-)2  P(-4)- 


If  the  neighborhood  size  is  larger  than  clog(n)  for  some  constant  c,  then 
Lemma  2  will  imply  that  P(.4':)  <  o  (4j).  Therefore,  the  dominant  term  in 
the  above  expression  of  the  form  of  -fA.  Combining  the  lower  bounds  for  P\ 

and  P2 ,  we  obtain  a  lower  bound  of  the  form  of  Optimizing  over 

A„  proves  that 

inf  Rn(fJSY)>n(n-2l*). 

A  „,T 

This  completes  the  proof.  □ 

It  is  clear  from  the  proof  above  that  the  neighborhood  size  is  the  main 
parameter  that  controls  the  decay  rate  of  the  risk  of  the  YF.  The  Gaussian 
term  in  the  YF  weights  enables  an  improvement  in  the  constants  but  does 
not  play  any  role  in  the  decay  rate.  In  the  extreme  case  of  An  —  n,  when  all 
of  the  image  pixels  can  potentially  contribute  to  the  estimation  of  a  pixel, 
the  decay  rate  of  YF  degrades  to  0(1).  This  algorithm  is  called  the  range 
filter,  and  [8]  observed  in  practice  that  it  performs  much  worse  than  even 
linear  filters,  as  the  above  analysis  confirms.  Interestingly,  NLM  addresses 
this  issue  and  therefore  its  search  space  could  be  the  entire  image.  This  is 
the  main  reason  for  its  improved  performance. 

The  lower  bound  proved  in  Theorem  3  is  the  same  as  the  upper  bound 
we  derived  for  the  performance  of  linear  filtering.  Therefore,  we  have  the 
following  theorem. 

Theorem  6.  The  risk  of  the  SYF  satisfies 

inf  sup  fl„(/,/sy)xn-2/3. 

A  «’T/e//Q(C) 
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4-3.  Proof  of  Theorem  4 

The  proof  has  two  main  steps.  First,  we  show  that  the  risk  of  the  pixels 
far  from  the  edge  is  0(log1+2f(n)/n2).  Second,  we  show  that  the  risk  of  the 
pixels  whose  Sn  neighborhood  senses  the  edge  is  constant;  however  there  are 
at  most  0(nSn)  of  these  pixels.  The  following  two  lemmas  will  play  key  roles 
in  our  analysis. 

Lemma  3.  Let  Z  ~  N(0,a2).  For  \  <  ^ ,  we  have 


1 

x/1  -  2A<t2  ' 


Proof.  The  proof  is  a  simple  integral  calculation: 


□ 


Lemma  4.  Let  Z i,  Z2, . . . ,  Zn  be  iid  N(Q,  1)  random  variables.  The  y2  ran¬ 
dom  variable  defined  as  XaLi  Z?  concentrates  around  its  m,ean  with  high  prob¬ 
ability,  i.e., 


P 

P 


1  >  tj  <  e-5(t-ln(1+t)), 

1  <  -t  \  <  e-i^+M1-*)), 


Proof.  Here  we  prove  just  the  first  claim;  the  proof  of  the  second  claim  follows 
along  very  similar  lines.  From  Markov’s  Inequality,  we  have 


P 


1  >  t]  <  e 


— At— Ai 


(17) 
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2<5„ 

2<5„ 


Figure  3:  An  example  of  a  Horizon  image.  The  ^-neighborhood  of  pixel  ( ia,ja )  £  Si  does 
not  intersect  the  edge  contour,  while  the  -neighborhood  of  pixel  ( ib,jb )  £  S3  intersects 
with  the  edge  contour. 


The  last  inequality  follows  from  Lemma  3.  The  upper  bound  proved  above 
holds  for  any  A  <  f .  To  obtain  the  lowest  upper  bound  we  minimize  — - — 

2#iy  Pegging 

□ 


over  A.  The  optimal  value  of  A  is  A*  =  arg  min> 
A*  into  (17)  proves  the  result. 


Proof  of  Theorem  j.  We  will  consider  the  following  partition  of  the  image 
pixels.  Let  S  —  {1, 2, . . . ,  n}  x  {1,2,...,  n}.  For  a  given  Horizon  function 
fh{ti,t2),  define  =  {(i.j)  \  {  >  h(i)  +  2^},  S2  =  {( i,j )  |  h(i)  <  £  < 
Mj)  +  fh  =  {{ij)  I  h(i)  ~^<i<  h(i)},  and  S4  =  |  i  < 

h(-)  —  — }.  These  regions  are  displayed  in  Figure  3. The  ^-neighborhood  ol 
the  pixels  in  Si  and  S4  do  not  intersect  the  edge,  while  the  Sn -neighborhood 
of  the  other  pixels  may  have  pixels  from  both  sides  of  the  edge.  See  Figure 
3.  For  the  notational  simplicity  we  write  ^eS(  for  the  double  summation 
over  i.j  where  j  satisfies  the  constraints  specified  for  S(. 

Consider  a  pixel  (i.j)  G  Si .  The  risk  of  NLM  at  this  pixel  is 


E 


E  U-hnTy»i.t.  \ 

E  wrn,e  J 
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where  Xjj  —  0,  since  (i,  j)  G  S\.  Define  the  set  of  oracle  weights 


/l  ^  n  >  ), 

(  0  otherwise. 


(18) 


Define  U  =  ,  and  let  the  event  A  =  {wm^  =  w^e,  \/{m,£)  G 

Si  U  84}.  We  then  have 


E (U)  =  E (U  |  y4)P(A)  +  E(U  |  ,4C)P(TC)  <  E (U  |  A)P(yl)  +  P(AC),  (19) 


where  the  last  inequality  is  due  to  the  fact  that  the  risk  of  the  estimator  is 
bounded  by  1.  We  now  calculate  each  term  of  (19)  separately. 

Define  Su  —  S\  U  ,S'i  and  S23  =  S2  U  S3.  Then  we  have 


E(Lr  |  A)P(yl) 


=  E 


<  E 


<  E 


E 


E(m,£)eS14  w*m/ym/  +  E(m/)eS2 3  Wm,iym,e 
S(m,f)eSi4  wm,e  +  J2(m,e)es-23  Wm<e 

T,(m,e)es14  W*m,eym,e  +  E(m,)e523  Wm/ym ,1 
S(m,£)esi4  wm,e  +  E( m,e)es23  w,n>( 

EMe5M  Wrn,eXm,e  +  E(m,£)eS2 3  W’n,f:xm,( 
E(m,£)eSi4  +  S(m,£)eS23  Wm>e 
EM€S14  Wm,ezm,e  +  E(m,£)eS2 3  Wm,ezm,e 


E(m,f)e5i4  Wm,e  E(m/)eS23  Wm,e 


A 


A  P(A) 


E 


E(m,r)e5i4  Wrn,eXm>e  +  E(^)eS23  Wm,exm,£ 

E(m,£)eSi4  Wm,e  +  E(m,£)eS23  Wm,t 


X 


\ 


E 


EMeSiUS4  Wm,eZm,e  +  E(m,£)eS23  wm,ezm,e 
E(m,£)eSi4  +  E(m,£)e523 


(20) 


The  last  inequality  is  due  to  Cauchy-Schwartz.  In  the  next  two  lemmas  we 
bound  the  last  three  terms  of  (20). 
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Lemma  5.  Let  u;m>(  be  the  weights  of  NLM  with  Sn  —  log 2 +e  n  and  tn  = 
,  1 ,  for  e  >  0.  Also,  let  w*ie  be  the  oracle  weights  introduced  in  (18). 

y  log  2  n 


Then 


E 


E(m/)eSi4  Wm,eXm,e  +  EMe&3  Wm,£xm,£ 
^2(m,e)eSi4  wm,e  +  Yl(m,e)es2 3  Wrn<( 


=  () 


8l 


n 


Proof.  Define  Sf  as  the  set  of  indices  of  the  pixels  whose  noise- free  value  is 
neither  zero  nor  one.  Since  the  images  are  chosen  from  the  Horizon  class,  the 
cardinality  of  this  set  is  at  most  Tn.  Plugging  in  the  values  of  xm we  have 


E 


EMeSi4  Wrn,eXm,e  +  Z(m,£)eS23  W™£xm,£ 

E(m,«)eSi4  Wm,e  +  E(m,«)eS23  Wm’e 

Ka)  I  Z(m,e)es14  Wm,£Xm,£  +E  (m,£)eS3\Sf  Wm,£  +  Z(m,£)eSf  Wm,£xm,£ 

'(m,£)eSi4W™,t 


(b) 

<  E 


<  E 


Z(m,£)eSi4Wrn,£  +  Z(m,£)eS3\SfWm’e  +  Z(m,£)eS2\SfWm<e  +  Z(m,£)eSf  Wm£ 
Z(m,£)eS14  Wrn,£Xm,£  +  Z(m,£)eS3  *  +  Z(m,£)eSf  Wm£xm,£ 


E, 


m,£)eSi4  Wm,£ 


E, 


m,f)eS3 


E, 


EMeSi4  Wm,£Xm,£  +  E (m,£)eS3  1  +  2n 


m,£)eSf  Wm<e 
2 


E, 


m,i)eSi4  Wm,£ 


E(m,£)e53 


=  0 


'S' 

n2 


where  Inequality  (b)  is  due  to  the  fact  that  the  expression  after  Equality 
(a)  is  an  increasing  function  of  Z(m£)es3\sf  w",(  an<^  a  decreasing  function  of 
E (m,£)es2\sf  u,m£-  Therefore,  we  set  wm,e  =  1  for  (m,  £)  G  S3  and  wm,e  =  0 
for  (m,£)  e  £>2.  □ 


Lemma  6.  Lei  wm/  be  the  weights  of  NLM  with  Sn  —  log  2 +tn  and  tn  = 
,  2  ,  for  e  >  0.  .4 /.so.  /ei  to*(  e  be  the  oracle  weights  introduced  in  (18). 

ylog2  n 


Then  we  have 


E 


E(m/)eSi4  W*m,£Zm,£  +  E (m,£)eS23  Wm,£zm,£ 


=  0 


nz 


Z(m,£)eSi4  Wm,£  +  E(m,£)eS23  W"l't 
Proof.  Since  E(m  e)es23  wm,e  T  0  and  we  are  interested  in  the  upper  bound 


20 


of  the  risk,  we  can  remove  it  from  the  denominator  to  obtain 


E 


<  E 


=  E 


E(m,«)eSi4  Wm,eZm,e  +  J2(m,e)eS23  Wm,ezm, 
E(m/)eSi4  Wm,e  +  E(m,£)eS23  Wm,e 

EMeft4  Wm,eZm,e  +  E(m/)eS2 3  Wm,tzm,£ 


m,t)eSu  wm,e 


2E 


E 

E(m,£)eSi4  Wm,eZm,£ 

5Z(m,£)eS'i4  in. ( 

E(m/)eSi4  Wm,eZm,e 


E(m,£)eS2 3  W*n,ezm,e 


(m,e)eS14  ^'in,t 


E, 


m/)eSi4  Wm,£ 


E 

E(m,r)e6'23  Wm,tzm,l 

t; 


m,<)e5i4 


(21) 


Since  is  the  average  of  iid  random  variables,  it  is  not  hard  to 

£(m,06S14  «m,«  6 

prove  that  E  f  1  ^4—  1  =  O(^).  To  bound  the  other  two  terms  in 

1  V  E(m,£)6S14  K  n*  ’ 

(21)  we  use  the  notation  defined  in  the  last  section:  (  —  {( i,j )  :  \i  —  m\  < 

A,  |  j  —  i\  <  A}  D  S.  We  also  define  E(-  |  C^l  ()  as  the  conditional  expectation 
given  the  variables  in  C£e.  We  then  have 


E 


E(m/)eS23  wrn)f:zm,( 

E 


(m,£)eSi4  Wm,t 


=  E  E 


E(m/)eS23  wm,ezm,e 


m/)eSi4  wm,l 


=  E 


®(E(mV')€S23  E(m,£)eS23  Wm,ezm,e'wm' /' zm' /’  \  C,  'j) 


(E(m, 


e)eSi4  wm,e)2 


=  E 


^Ep^oec-l  E(m,£)es23  Wrn,eZm,eWrn',e,Zrn',e'  I  Cf 


(E(m,f)eSi4  <J2 
Efm'^OeC2^  E(m,£)es2 3  E (  U!m,(zm,(wm'  J/ zrn’ ) 


(E, 


m,^)eSi4 


</)! 


<0 


For  the  last  inequality  we  have  used  the  Cauchy-Schwartz  Inequality  to  prove 
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that  E (wrr,ezrn,ewm',e'Zrn'le')  <  3cr2.  Although  we  could  derive  a  loose  bound 

/  f  2n 
E(m,£)6S23  wm,tzm,lS' 


for  E 


and  still  draw  the  same  conclusion,  we  used  the 


E  (m,£)6*5j4  ^m,£ 

above  technique  since  we  have  to  use  it  in  the  proof  of  Theorem  7.  The  last 
term  we  have  to  bound  in  (21)  is 


E 


S(m,^)eSi4  Wm,eZm,e\  (  S(m,£)es2  3  Wm,ezm,e 


(m. 


()eSi4  Wm,f 


e)esi4  wm,t 


< 


E 


S(m,f?)eSi4  Wm,eZm,e 


i)eSi4  Wm,£ 


E 


S(m,qe52 3  Wm,ezm,e 


\  V  E(», 


(m,£)eSi4  wm,t 


<ou 


rr 


This  proves  the  lemma. 

Using  Lemma  5  and  Lemma  6  in  (20)  proves  that 


E(U\A)F{A)  =  0 


81 


n 


□ 


(22) 


Finally,  using  Lemma  4  and  the  union  bound  it  is  easy  to  show  that 


P(yT)  =  O 


1 


rr 


(23) 


It  is  important  to  note  that  the  constants  of  this  probability  are  hidden  in 
the  O  notation.  These  constants  depend  on  t  and  increase  as  e  decreases. 
Therefore,  we  cannot  set  e  =  0. 

Plugging  in  (23)  and  (22)  in  (19)  results  in 


E 


Xij 


ZWmSymA*  c(l  Og1+2e(n)\ 

wm,e  J  \  n2  J 


V(i,  j)  e  Si. 


Now  consider  (i.  j)  (E  S2  U  S3.  In  this  region  we  can  bound  the  error  by  the 
worst  possible  risk,  which  is  1.  We  will  discuss  the  sharpness  of  this  bound 
in  the  next  section  where  we  develop  a  lower  bound  for  the  risk. 

Using  the  bounds  provided  above  for  the  risks  of  the  pixels  in  Si,  S2,  S3 
and  S4,  we  can  now  calculate  the  final  upper  bound  for  the  risk  of  the  NLM 
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as 


sup  R(f,  fNL)  = 

feHa(C) 

< 


< 


jZZE(*V-%f 

*  3 

iog^>KM±lM  i  I&I  +  I& 

n2  n2 

^logt^nfj 


In  order  to  derive  the  last  inequality  we  noted  that  since  h(t\)  G  H older1  ( 1) 
the  cardinality  of  52  and  S3  are  0(n  log(n)).  This  completes  the  proof  of 
Theorem  4.  □ 

4-4-  Proof  of  Theorem  5 

Suppose  that  the  parameters  of  SNLM  satisfy  assumptions  A1-A4.  To 
derive  a  lower  bound  we  consider  the  performance  of  the  SNLM  algorithm 
on  the  simple  image  in  Figure  2.  For  notational  simplicity  we  assume  that  n 
is  even,  and  hence  all  of  the  pixel  values  are  either  0  or  1.  The  proof  follows 
four  main  steps: 

1.  We  consider  the  pixels  that  are  just  above  the  edge,  i.e. ,  (i,  [§]),  and 
prove  that  the  risk  of  the  NLM  on  these  pixels  is  lower  bounded  by  a 
constant  that  does  not  depend  on  n. 

2.  Using  asymptotic  arguments  we  prove  that  the  probability  a  pixel  just 
below  the  edge  passes  the  threshold  tn  >  0  is  larger  than  p(),  where 
Po  is  a  non-zero  probability  independent  of  n.  Based  on  this,  we  use 
a  concentration  argument  to  prove  that  @(n)  of  the  pixels  just  below 
the  edge  will  pass  the  threshold  with  high  probability.  See  the  formal 
statement  in  Theorem  1. 

3.  Using  symmetry  arguments  we  prove  that  the  probability  a  pixel  that 
is  t  <  Sn/2  rows5  above  the  edge  or  below  the  edge  passes  the  threshold 
is  equal.  This  is  formally  stated  in  Lemma  8. 

4.  Combining  the  outcomes  of  Steps  2  and  3  we  show  that  the  risk  is 
minimized  if  all  the  pixels  just  above  the  edge  pass  the  threshold  and 
the  probability  that  the  other  pixels  pass  the  threshold  is  as  low  as 


5The  Ith  row  of  an  image  is  the  set  of  all  pixels  of  the  form  {i,  (.). 
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possible.  If  more  zero  pixels  above  the  edge  pass  the  threshold,  then 
more  pixels  with  noise-free  value  1  will  also  pass  the  threshold,  and  this 
makes  the  bias  large.  Therefore  we  assume  that  pn.te,  the  probability 
that  a  pixel  at  distance  £  of  the  edge  passes  the  threshold,  is  equal  to 
zero  for  £  >  1.  However,  we  have  already  proven  that  for  £  —  1  the 
probability  is  larger  than  p().  Theorem  5  uses  this  fact  to  show  that  the 
risk  of  this  estimator  is  larger  than  a  constant  independent  of  n. 


Proposition  1.  Let  j*  =  \ |] .  For  any  pixel  with  coordinates  of  the  form 
( i*,j *),  there  exists  a  non-zero  constant  probability  po  such  that  for  any  Sn 
and  tn 


npo  <  ~t 


<  4Sne 


Proof.  For  notational  simplicity  we  use  i  =  i*  and  j  —  j*  in  the  proof.  We 
have 


^{dsn  2/m,j  — i)  —  °  T  tn 

=  P 


(  n2  ^  y  I xi+p,j+e  Vm+p,j-l+e  |  (xi,j  Vp,j- 1)  )  —  °  T  tn  j 

\ln  tp  J 

= p  E(4  - ff2)  -  j  E  s'.»  ^  ~ + *») 

\l>n  e,p  f>n  t  Pn  J 

^ p  E(si - ff2)  -  j  E Sf.°  -  • 

\Pn  Pn  t  Pn) 


where  s^m  —  zm+e,j-i+p-  According  to  the  Berry-Esseen  Central  Limit  The¬ 
orem  for  independent  non-identically  distributed  random  variables  [16],  we 
know  that 


>  P(G  <  -1) 


C_ 

Pn 


where  G  is  a  Gaussian  random  variable  with  mean  zero  and  bounded  standard 
deviation.  In  fact,  it  is  not  difficult  to  confirm  that 


E(G2)  =  2<t4  + 


8 a25n  -  2a 4 
(2Sn  +  I)* 
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Since  P(G  <  -1)  >  2 p0  (2 p0  is  P{G'  <  -1)  where  G'  ~  N{0,2a4))  is 
non-zero,  for  large  values  of  n  we  can  ensure  that  C/n  <  po  and  therefore 
that  P(d|  {'Ih.j-  Um.j—i )  <  <y2  +  tn)  >  po .  We  now  prove  that  even  though 
the  weights  are  correlated,  @(n)  of  the  weights  will  be  equal  to  1  with  very 
high  probability.  Define  ut  as  u>i,j- i  and  define  the  process  U  —  (ui, ...  ,un). 
Break  this  sequence  into  2 Sn  subsequences  Ui  -  (uu  ui+2Sn,ui+4sn ,  •  •  • ,  un_2sn+i) 
Each  U{  has  independent  and  identically  distributed  elements.  Therefore,  ac¬ 
cording  to  the  Hoeffding  Inequality,  we  have  P(|  eU  Uj  —  ^-E(uj)|  >  t)  < 

-t2Sn  3  ‘  H 

2e  «  " .  On  the  other  hand  we  know  that  E{u.j)  >  pa.  Therefore, 


n 


-tzS„ 


P  (  Y1  u3  <  ~f  \  ^  2e  "  "  • 


iMj-ef/t 


Finally  we  use  the  union  bound  to  obtain 


P 


(5^  Ui  ~  nP°  —  —  P  EE 

\  i  UjEl/i 


n 

**  -  2&?0  -  ~ f 


<  P  U i{uj  :  ^2  uj  -  ^po  <  -  —  }  <  4 Sne 


UjGU, 


25,. 


□ 


Define  the  set  J  =  {(i,  j)  \  j  —  {Jh(^) J}.  It  is  clear  that  |./|  —  n.  The 
following  Corrollary  to  Proposition  1  shows  that  NLM  sets  the  weights  of 
most  of  the  pixels  in  J  to  1 . 

Corollary  1.  Consider  the  image  displayed  in  Figure  2,  and  let  8n  —  0(na) 
for  a  <  1.  For  any  Sn  and  tn  >  0,  O(n)  of  the  pixels  in  J  will  pass  the 
threshold  tn  with  very  high  probability. 

Proof.  Set  t  —  n~^~  in  Proposition  1.  □ 


Remarkably  the  above  corollary  holds  in  a  very  general  setting  even  if  the 
assumptions  A1  A4  do  not  hold.  In  other  words,  NLM  in  its  most  general 
form  is  not  able  to  distinguish  between  the  pixels  right  above  the  edge  from 
the  pixels  right  below  the  edge.  This  is  due  to  the  fact  that  the  “signal  to 
noise  ratio”  in  the  ^-neighborhood  distance  estimates  is  too  low  at  the  edge 
pixels.  This  is  the  result  of  the  isotropic  neighborhoods  used  to  form  the 
weight  estimates. 
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Lemma  7.  If  \m  —  i*\  >  Sn/2  and  \ ml  —  i*|  >  8n/2,  then 

,j*  ■,  Vm  ®  T  ^n)  =  P  ( d'Sn  ( }j%  *  j  *  •  Via'  ®  "I-  ^n) 

for  any  £,  to,  m! . 

The  proof  of  this  lemma  is  obvious  and  is  skipped  here. 

Lemma  8.  For  £  <  Sn/2, 

(y»* j* )  ym,j*-e)  ^  o  T  tn)  —  {Vi-^  j  Vnij’+e)  —  a  Ftn). 

The  proof  of  this  lemma  is  also  obvious  from  symmetry  and  is  skipped 
here.  We  can  now  prove  Theorem  5,  which  provides  a  lower  bound  for  the 
risk  of  SNLM. 


Proof  of  Theorem  5.  We  derive  a  lower  bound  for  the  risk  of  SNLM  on  the 
image  displayed  in  Figure  2.  To  do  so,  we  consider  the  pixels  just  above  the 
edge  and  prove  that  the  SNLM  algorithm  has  risk  0(1)  at  these  pixels.  Since 
there  are  @(n)  of  these  pixels,  the  risk  over  the  entire  image  is  larger  than 
0(n-1). 

Consider  a  pixel  ( i*,  j *)  with  j*  =  [|].  The  risk  of  the  SNLM  is 


E 


E  E  wm,eym,£  \  >  /  Jg  (  E  E  wm,eym,e 
E  E«rn,i  /  ~  V  V  EE  wm,l  ) ) 


(24) 


Note  that  wm>e  is  independent  of  the  ym>e  according  to  the  construction  of  the 
SNLM  weights  in  (10).  Let  pnj  be  the  probability  P (we,i  —  1)  for  £  G  {j*  - 
Sn,j*  —(•)„  +  1 .... ,  j*  +  Sn } .  We  can  partition  the  row  { (i,  £)  |  1  <  i  <  n}  into 
2 Sn  +  1  subsequences  and  apply  Hoeffding  inequality  on  each  subsequence. 
We  combine  the  results  of  different  subsequences  with  the  union  bound  to 
prove  that 


<  4(Le4nA>< . 


(25) 


Define  the  event  A  as 


A  — 


npn/\  <  nQM  W£,\£ 


26 


,1.32 


Using  the  union  bound  and  (25)  we  have 

P(AC)  <  8 5%e  *nSn  . 

Any  lower  bound  on  the  bias  of  the  estimator  leads  to  a  lower  bound  on  its 
risk.  Therefore,  we  find  a  lower  bound  for  the  bias  as  follows: 

IE«  rn.djrn,? 


£  f  E  E  y  Jg 

V  EE«w  )  ~ 

/  E  E  Wm/Vm/ 


>  E 


nmS, 


E  E  wm,e 

a]  P(A)  >  E 


Aj  P(A) 

(  E  E  u>m,eym,e 
\E  uPn,e  +  n'66^ 


-  P{AC), 


\  E  nPn,e 

where  for  the  last  inequality  we  have  used  the  fact  that  the  risk  of  SNLM 
is  bounded  by  1.  Since  from  the  construction  of  SNLM  in  (10),  wm/  is 
independent  of  zm<(,  we  have 


E 


'  E  E  w 

E  nP,+  +  n()mdr 

Er<j*  nPn,e 


-  P(AC)  =  E 


-  P(AC)  > 


E  E  Wm, 13*171,1 

npn,(  +  n°-6<V)„ 

E  tcj-  nPn,£ 


~  P{AC) 

-  P(AC). 


E  nPn,e  +  n°-66Sn  71  +  2  E (<j‘  nP»,<  +  n°-666„ 

Proposition  1  proves  that  both  the  numerator  Ef<;*  "Pnj  and  the  denome- 
nator  E  nPn,e  +  n0  66Sn  are  Q(n).  Therefore,  according  to  A3,  we  can  ignore 
the  summations  E^cj*-^  nPn,e  and  Ef>j-+^  nPn,e ■  By  combining  this  fact 
with  Lemma  8,  we  obtain 


Ef< j*  nPn,e 


E  nPn,e 
>  — 


-  P(AC)  > 


E t<j*  nPn,e 


n°-66Sn  npnj*  +  2  E«j.  nPn,e  +  n°-66Sr 

Ec<j*  npn,e 


-  P(AC) 


—  P(AC). 


n  +  2  E e<j‘  nPn/  +  n°-668n 
In  the  last  inequality  we  assumed  that  pnj-  =  1.  To  find  a  lower  bound 


^  np"^  „  ,iliA  it  is  enough  to  note  that  nee* 

n+ZJZecj*  npn/+«u-e'i5„  °  n+2£f<J.„  npn,e+n'l-hbSn 

increasing  function  of  E^<j*  nPn,e  and  therefore  is  minimized  if  and  only  if 
E;<j*  nPn,e  takes  its  minimum  value.  However,  according  to  Proposition  1 
the  minimum  value  of  this  term  is  (-)(//).  Therefore,  we  have 


- ^,<i’nPn/  -  P(A‘) 

n  +  2  E/<j*  nPn,e  +  nM6n 

>  - ^^-P(A‘)  = 


Po 


npo  +  n  +  nmS„ 


Po  +  1 


(1  +  0(1)). 
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This  completes  the  proof. 


□ 


5.  Tapered  NLM  Weights 

In  this  section  we  show  that  the  upper  bound  we  provided  for  the  NLM 
in  Theorem  4  holds  in  the  more  general  setting  of  tapered  weights.  We 
now  allow  the  weights  to  be  a  smooth  function  of  the  ^-neighborhood.  We 
assume  that  the  weight  assignment  policy  satisfies  the  following  properties: 

Bl:  The  neighborhood  size  Sn  =  21og(n). 

B2:  The  weights  are  non- negative  and  bounded,  i.e. ,  0  <  wrnj  <  a. 

B3:  If  (I2 (Xij ,  xm<e)  —  0,  then  the  assigned  weight  satisfies  E(tuiJ  (m.  £))  >  c, 
for  some  constant  c  independent  of  n. 

B4:  If  d?(xij,xmte)  —  1.  then  E(uiy(m, f))  —  O  If  shall  be  empha¬ 

sized  that  slower  decay  rate  in  this  expectation,  results  in  slower  decay 
in  the  rate  of  the  NLM  algorithm. 


Theorem  7.  If  the  weight  assignment  policy  in  NLM  satisfies  properties 
Bl  Bf,  then 


sup  R(f,  fN)  =  O 

feHa(C) 


Proof.  Consider  the  four  partitions  Sj  S4  defined  in  the  proof  of  Theorem 
4.  Our  goal  is  to  obtain  an  upper  bound  for  the  risk  of  the  pixels  in  each 
region.  The  risk  of  the  pixels  in  S2  and  S3  will  be  bounded  by  the  strategy 
we  employed  in  Theorem  4.  Here,  we  just  explain  how  we  bound  the  risk  of 
the  pixels  in  Si  and  S4.  Since  the  proof  for  S4  is  the  same  as  the  proof  for 
Si,  we  consider  just  Sj.  Let  (i,  j)  €  Si.  Therefore  xtj  =  0  and 


E  {xij-fZf)2  =  E 


=  E 


'  J2  U>m,eym,e 

.  X)  wm,e 

y. 


E 


X  7 

\\  X  wm,e 


E 


(Eff  m,  ezm,e 

V  X  wm,e  , 

Eff  m  ,ezm,e\ 

X  wm,e  J 
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To  obtain  an  upper  bound  for  the  risk,  we  will  find  upper  bounds  for  the  last 
three  terms  in  (26).  Lemmas  9  and  10  below  summarize  the  upper  bounds. 


Lemma  9.  Let  tvm.e  be  the  weights  of  NLM  satisfying  Bl  Bf.  Then 


E  =0m 

V  Lv  /  W 

Proof.  Define  S j  as  the  set  of  the  indices  of  the  pixels  whose  noise- free  value 
is  neither  0  nor  1,  and  plug  in  the  actual  values  of  x,n  f  to  obtain 


E 


E(m,£)eSiUS4  Wm,eXm,e  +  '^2(m,e)eS2VSa\Sf  Wm,eXm,e  +  Yl(m,e)eSf  Wm,eXm,e 

E(m,r)eSius4  Wm’e  +  E(m,£)es2us3\6y  Wtn’e  EMe5/  wm,e 

,  _  _  _  o 

E(m,r)es4  WmL  +  E( m,e)es3\sf  u’m,e  +  J2(m,e)esf  wm,exm,t 


<  E 


<  E 


E(m,r)eSiUS4  Wm,i  +  E(m,£)es2us3  Wm>e  +  E, 

E(m/)es4  wm,e  +  E(m,£)es3  a  +  ^nQ 
E(m,£)eSiU54  Wm’e  +  E(m/)es3  Q 


m,e)eSf  Wm’t 


(27) 


To  derive  the  last  inequality  we  use  the  following  facts,  which  are  easy  to 
check: 


1. 

2. 

3. 


( ^Mes‘  "w+£<’y \ 2  is  an  increasin  func_ 

\  Z-(m,f)€S1US4  u’m,£  +  Z^(m,£)€S2US3  tom,£+2^(m,£)€S^  wm,t  J 

tion  of  E (m,e)es3\Sf  wm,e • 

(E(m,£)es41um^+E(m,oes3\sf  Wm.e+T,(m.e)es  r  wm,eXm,e \  .  r 

- r-~ — - — - - — - -  is  a  decreasing  func- 

Z-(m,f)6S1US4  wm,l  +  Z^(m,£)€S2US3  ™m,£  / 

tion  of  E(m,£)eS2\5/  w"‘,e- 
| Sf  |  <  2 n,  i.e.,  S'/  contains  at  most  2 n  pixels. 


Our  next  claim  is  that  E(m  e)es4  wm,e  and  E(m  e)eSi  wm,e  concentrate 
around  their  means.  We  establish  this  in  a  manner  very  similar  to  the  proof 
of  Theorem  5.  We  first  break  the  E(m  e)es4  wm,e  into  (4 Sn  +  2)2  subsequences 
such  that  each  subsequece  contains  only  independent  random  variables.  In 
other  words  if  xm/  is  in  one  summation,  then  no  other  element  of  Cfsf+2 
will  lie  in  the  summation.  Therefore,  for  each  summation  we  can  apply  the 
Hoeffding  inequality.  Finally,  we  use  the  union  bound  as  explained  in  the 
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proof  of  Theorem  5  to  show  that 


P 


55  Wm,e  -  55  E(^W) 
(m,£)eSi  (m,e)eSi 


>tj<  2(44  +  2)2e(4in+2)4,E("*'£)esi 
_  -2 12 _ 

<  2(44  +  2)2e(46"+2,'<E(-n,0€s1  «**>. 
It  is  straightforward  to  prove  that  by  setting  t  to  32cm  log2'5  (n),  we  have 


P 


55  Wm’e  -  55  E(uw) 

( m,C)eS4  (m,£)eSi 


p 


p 


I  55  WmJ  -  55  E(^w) 

y  (m,£)eSi  (m,£)eSi 

/ 

\ 

>  32cm  log2'5(n) 

) 

|<o( 

I  55  w"ht  -  55  E(t,j™/) 

y  ( m,l)eS4  (m,£)sSi 

\ 

>  32cm  log2'5(n) 

) 

)<o( 

'4 


n° 


SI 


nn 


•  (28) 


..2.5 


n 


Define  the  event  F  as  1 1  X)(m,*)esi  w™-t  ~  ^2(m,e)eSi  E( )  <  32cm  log 

n  {l  E(m,*)664  Wm’e  -  Efm^eS!  E(«w)l  <  32an  log2'5  n} .  It  is  clear  from 
(28)  that 

P(FC)  =  O  (^j  .  (29) 

Using  (27),  (28),  and  (29)  we  have 

E(m,c)es4  Wm>e  +  Y^(m,£)eS3\Sf  Q  +  2na 


E 


E(m,C)eSiUS4  Wm<e  +  ^2(m,£)es3\sf  a 


<  E 


E(m,£)eS4  wrn,e  +  Yl(m,£)eS3\Sf  a  +  2«.C* 
E(m,£)eSiUS4  Wm,t  +  ^2(m,£)eS3\Sf  Q 


F  P(F)  +  P(FC 


5  °u>- 

The  last  inequality  is  due  to  Assumptions  B3  and  B4.  This  completes  the 
proof  of  the  lemma.  □ 


30 


Lemma  10.  Let  wm/  be  the  weights  of  NLM  with  Sn  =  log(n).  Also  assume 
that  the  weights  are  set  according  to  B1--BA.  We  then  have 

E  =  Q  /log2(n)\ 


V  E  wm,e  J  \  n2  J  ' 

Proof.  We  first  condition  on  the  event  F  introduced  in  the  proof  of  Lemma 

9. 

TP  (  (T,^m,eZmA2\ 

) 

<  E  ^  F^jP(F)  +  P(Fc) 

<  e(  (— - ^  wm,eZm,e —  \  F \  p(F)  +  p/Fc) 

“  I  VEE(tiw)-  32cm  log2-5 (n)J  {  ’  ’ 


<  E 


<  O 


( _ E  WmjZmJ _ 

\E)E(wm>*)  -  32cm  log2'5  (n) 

log2(n)\ 


The  last  inequality  is  due  to  the  fact  that 


E  I  wm,eZm,t  Cfj 

V  (m,QeSi4 


=  E  ^  ^2  Wm,eZm!eWm'/'Zm'/' 

\(m,£)eSi4  (m',C)GSi4  / 

'y  ^  'y  ^  E {wm  (Zm  (Wmi  (i zmi  (i  |  Cff )  0[n  Sn). 


(m,e)es i4  (m' ,e')ec™n. 


Therefore, 


E(m,Qe523  wm,ezm,e\  \  <  ^  /  5% 
E(m,«)e5i4  Wm>*  J  J  V'2 
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Using  the  bounds  derived  in  Lemmas  9  and  10,  we  can  complete  the  proof 
of  the  main  theorem: 


feHa(C) 


*  j 


<  log2(n)(|51|  +  |g4|)  ,  \S2\  +  \S3\  <Q 


'log  (n)' 


TV 


rr 


n 


□ 


6.  Discussion 

We  have  provided  the  first  asymptotic  result  on  the  risk  analysis  of  the 
nonlocal  means  (NLM)  algorithm  on  smooth  images  with  sharp  edges.  In 
contrast  to  most  other  filtering  approaches,  NLM  does  not  consider  the  spa¬ 
tial  vicinity  of  the  pixels  as  a  feature  for  setting  the  weights.  Instead,  it 
exploits  more  global  features,  which  leads  to  improved  performance. 

In  spite  of  this  success,  we  have  shown  that  the  performance  of  NLM  is 
within  a  logarithmic  factor  of  the  performance  of  the  wavelet  thresholding 
and  still  significantly  below  the  optimal  achievable  rate.  This  is  due  to  the 
fact  that  the  isotropic  nature  of  the  NLM  neighborhoods  does  not  allow  the 
algorithm  to  discriminate  the  pixels  that  are  close  to  but  below  the  edge  from 
the  pixels  that  are  close  to  but  above  the  edge.  This  leads  to  a  blurring  effect 
that  results  in  high  bias  along  the  edge.  Exploring  the  performance  of  NLM 
with  anisotropic  neighborhoods  may  address  this  issue  and  is  left  for  future 
research. 
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